Clinical medicine journals lag behind science journals with regard to “microbiota sequence” data availability

1 School of Biomedical Sciences and Pharmacy, College of Health, Medicine and Wellbeing, University of Newcastle, Newcastle, Australia
2 NHMRC Centre for Research Excellence in Digestive Health, University of Newcastle, Newcastle, Australia
3 School of Medicine and Public Health, College of Health, Medicine and Wellbeing, University of Newcastle, Australia
4 Hunter Medical Research Institute, Newcastle, Australia


BACKGROUND
Microbiota sequencing has received much greater attention over the past 10 years as a result of decreasing sequencing costs and advancing analysis capabilities. [1-3] Despite this, cohort microbiota studies remain beyond the capacity of many researchers with novel hypotheses for a variety of reasons, including limited access to funding or to disease cohorts of interest. Despite these limitations, it is possible to collate primary microbiota data from published datasets and conduct secondary analysis to address unique research questions. This approach has been used previously to recover new findings, validate results and/or increase a study's power. [3-5] The method is most effective when the original publications cite accession numbers for their sequence data deposited in public repositories. Here we report the extent to which journals enforce the inclusion of data availability statements and, as a result, how we as a scientific community are lagging in the public deposition of sequence data, which hinders scientific progress.

Our research team recently posed a research question and related hypothesis that we believed had the potential to be answered from secondary analysis of published data from studies on the gut microbiota of functional gastrointestinal disorders. However, of the 24 studies identified as having related data, only five had published accession numbers for the associated datasets. Following attempts to contact the remaining 19 corresponding authors, only one study author has responded with the accession number for their publication to date. The remaining corresponding authors have not responded after 5 months, despite the fact that six out of 16 of these publishing journals have data availability requirements for published studies. This led us to question how stringently and effectively data sharing requirements are enforced in the field of microbiome research.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. Clinical and Translational Medicine published by John Wiley & Sons Australia, Ltd on behalf of Shanghai Institute of Clinical Bioinformatics

COMMENTARY
To investigate the relationship between strict data availability requirements, journal impact factors and journal quartiles, we generated a spreadsheet of journal titles, with their reported impact factors and quartiles from 2020, from categories likely to publish research related to the human microbiota, using InCites Journal Citation Reports (https://jcr.clarivate.com). From this extensive list of journals (n = 4123), we selected 150 titles for further analysis. Journals were selected based on whether microbiota research was within the scope of the journal. Where possible, we selected a minimum of two journals for each integer impact factor to ensure that the final selection was representative of the spectrum of impact factors. The final selection consisted largely of journals with a focus within the fields of microbiology, gastroenterology, clinical medicine and discovery science. Of the resulting 150 journal titles, a further 55 were removed because they had not published research articles reporting microbiota sequencing in the past 5 years (Table S1). Two independent reviewers examined the author guidelines for the remaining 95 journals to assess whether each journal required the submission of sequencing data, the submission of microarray data and the inclusion of a data availability statement for publication. Curated data were graphed and analysed using GraphPad Prism 9. Statistical analyses were conducted with STATA v.15.

FIGURE 1 Comparison of journal impact factor with the requirement for (A) submission of sequencing data (59% of the 95 journals screened did not require submission of sequencing data with the publication), (B) submission of microarray data and (C) inclusion of a data sharing statement. Significance level (unpaired t-test): ns, not significant; *p-value < 0.05; ****p-value < 0.0001
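The coverage rule described in the selection step (at least two journals per integer impact factor, where possible) can be illustrated with a small check. This is only a sketch: the function names and the example impact factors are hypothetical and are not taken from the study's spreadsheet, and it assumes that "integer impact factor" means the whole-number part of the impact factor.

```python
import math
from collections import Counter


def count_per_integer_bin(impact_factors):
    """Count journals per integer impact-factor bin (e.g. 7.313 falls in bin 7)."""
    return Counter(math.floor(value) for value in impact_factors)


def underfilled_bins(impact_factors, minimum=2):
    """Return, in ascending order, the occupied bins holding fewer than `minimum` journals."""
    counts = count_per_integer_bin(impact_factors)
    return sorted(bin_ for bin_, n in counts.items() if n < minimum)


# Hypothetical impact factors, for illustration only (not the study's data).
example = [2.1, 2.8, 3.3, 3.9, 4.2, 7.3, 7.9]

print(count_per_integer_bin(example))
print(underfilled_bins(example))
```

Bins that contain no journals at all are not reported by this sketch, on the assumption that the "where possible" qualifier covers impact-factor ranges in which no suitable journal exists.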
Of the selected journals, 58% (n = 55) were classified as science type journals, while the remainder were clinical medicine journals (n = 40). Overall, the median impact factor in 2020 was 7.313 (range: 0.747-91.245). The average impact factor was 11.87 for science journals and 19.95 for clinical medicine journals. The median impact factor by quartile was 17.199 for the first quartile, 4.181 for the second quartile, 3.267 for the third quartile and 2.112 for the fourth quartile. Journals that requested sequencing data had a higher mean impact factor than those that did not, but the difference was not statistically significant (17.91 vs. 12.53, p = 0.14; Figure 1A). The differences were statistically significant for submission of microarray data (20.65 vs. 7.503, p < 0.0001; Figure 1B) and for data sharing statements (16.11 vs. 7.797, p = 0.0357; Figure 1C).
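For readers unfamiliar with the unpaired t-test named in the figure legends, the following is a minimal sketch of how the test statistic is computed for two groups of impact factors. It assumes the equal-variance (Student's) form of the test, which the text does not specify, and the two lists are hypothetical values chosen for easy arithmetic, not the study's data. The function name `unpaired_t` is likewise illustrative.

```python
import math
from statistics import mean, variance


def unpaired_t(group_a, group_b):
    """Student's unpaired t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical impact factors, for illustration only (not the study's data).
requires_data = [4.0, 6.0, 8.0]
no_requirement = [1.0, 2.0, 3.0]

t_stat, dof = unpaired_t(requires_data, no_requirement)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The p-value is then obtained by comparing the statistic against a t distribution with the returned degrees of freedom; a library routine such as `scipy.stats.ttest_ind` performs both steps.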

CONCLUSION
Our findings demonstrate that Discovery Science-based journals and journals with higher impact factors are more likely to request microbiome data for public access. We propose that access to published data (microbiota sequence or other) should be a standard mandatory requirement for every journal to facilitate reproducibility and the opportunity for novel findings.

ACKNOWLEDGEMENTS
We acknowledge the support of funding from the National Health and Medical Research Council (NHMRC) for the Centre for Research Excellence in Digestive Health and NHMRC Ideas grant.

Jennifer Pryor1,2,4, Guy D. Eslick2,3,4, Nicholas J. Talley2,3,4, Kerith Duncanson2,3,4, Simon Keely1,2,4, Emily C. Hoedt

FIGURE 2 (A) Comparison of impact factors of journals classified as either discovery science (n = 55) or clinical medicine type journals (n = 40), and whether these journals have requirements for (B) submission of sequencing data (59% of the 95 journals screened did not require submission of sequencing data with publication; p = .0015), (C) submission of microarray data and (D) inclusion of a data sharing statement. Significance level (unpaired t-test): ns, not significant; **p-value < 0.01

FIGURE 3 Comparison of journal quartile with a requirement for submission of (A) sequencing data, (B) microarray data and (C) data availability statement. Significance level (unpaired t-test): *p-value < 0.05; ****p-value < 0.0001

CONFLICT OF INTEREST
Jennifer Pryor, Guy D. Eslick and Emily C. Hoedt declare that they have no conflict of interest. Nicholas J. Talley reports personal fees from Allakos, Aviro Health, Antara Life Sciences, Arlyx, Bayer, Danone, Planet Innovation, Takeda, Viscera Labs, twoXAR, Dr Falk Pharma, Censa, Cadila Pharmaceuticals, Progenity Inc, Sanofi-aventis, Glutagen, ARENA Pharmaceuticals, IsoThrive, BluMaiden and HVN National Science Challenge, and non-financial support from HVN National Science Challenge, New Zealand, outside the submitted work. Nicholas J. Talley